Harvesting Application Information for Industry-Scale Relational Schema Matching

Authors

  • Nate Kushman
  • Fadel Adib
  • Dina Katabi
  • Regina Barzilay
Abstract

Consider the problem of migrating a company’s CRM or ERP database from one application to another, or integrating two such databases as a result of a merger. This problem requires matching two large relational schemas with hundreds and sometimes thousands of fields. Further, the correct match is likely complex: rather than a simple one-to-one alignment, some fields in the source database may map to multiple fields in the target database, and others may have no equivalent fields in the target database. Despite major advances in schema matching, fully automated solutions to large relational schema matching problems are still elusive. This paper focuses on improving the accuracy of automated large relational schema matching. Our key insight is that modern database applications have a rich user interface that typically exhibits more consistency across applications than the underlying schemas. We associate UI widgets in the application with the underlying database fields on which they operate and demonstrate that this association delivers new information useful for matching large and complex relational schemas. Additionally, we show how to formalize the schema matching problem as a quadratic program and solve it efficiently using standard optimization and machine learning techniques. We evaluate our approach on real-world CRM applications with hundreds of fields and show that it improves accuracy by a factor of 2 to 4.
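The abstract does not spell out the quadratic program, so the following is only a toy sketch of what a quadratic matching objective can look like, not the authors' formulation or solver. It assumes a linear term (a per-pair similarity score) plus a quadratic term (a bonus when two source fields that share a UI group are matched to two target fields that also share a UI group), and it maximizes the objective by brute force. All field names, scores, group labels, and the `bonus` weight are hypothetical.

```python
from itertools import product

def best_match(source, target, sim, src_group, tgt_group, bonus=0.5):
    """Exhaustively maximize a toy quadratic matching objective.

    Hypothetical illustration only: brute force is feasible for a handful
    of fields, not for schemas with hundreds of fields.
    """
    options = list(target) + [None]  # None means "no equivalent target field"
    best_score, best_assign = float("-inf"), None
    for choice in product(options, repeat=len(source)):
        # Linear term: similarity of each chosen (source, target) pair.
        score = sum(sim.get((s, t), 0.0)
                    for s, t in zip(source, choice) if t is not None)
        # Quadratic term: reward pairs of matches that preserve UI grouping.
        for i in range(len(source)):
            for j in range(i + 1, len(source)):
                ti, tj = choice[i], choice[j]
                if ti is None or tj is None or ti == tj:
                    continue
                if (src_group[source[i]] == src_group[source[j]]
                        and tgt_group[ti] == tgt_group[tj]):
                    score += bonus
        if score > best_score:
            best_score, best_assign = score, dict(zip(source, choice))
    return best_assign, best_score

# Hypothetical example data.
source = ["cust_nm", "cust_ph", "ord_dt"]
target = ["name", "phone", "order_date"]
sim = {
    ("cust_nm", "name"): 0.6,
    ("cust_ph", "phone"): 0.4,
    ("cust_ph", "order_date"): 0.45,  # misleading name-based score
    ("ord_dt", "order_date"): 0.7,
}
src_group = {"cust_nm": "contact", "cust_ph": "contact", "ord_dt": "order"}
tgt_group = {"name": "contact", "phone": "contact", "order_date": "order"}

with_ui, _ = best_match(source, target, sim, src_group, tgt_group)
without_ui, _ = best_match(source, target, sim, src_group, tgt_group, bonus=0.0)
print(with_ui)
print(without_ui)
```

In this toy data, the similarity scores alone favor matching `cust_ph` to `order_date`, while the grouping bonus tips the choice to `phone`. This mirrors, in miniature, the abstract's claim that UI information adds signal beyond the schemas themselves.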

Similar resources

New Challenges in Data Integration: Large Scale Automatic Schema Matching

Today schema matching is a basic problem in almost every data intensive distributed application, namely enterprise information integration, collaborating web services, ontology based agents communication, web catalogue integration and schema based P2P database systems. There has been a plethora of algorithms and techniques researched in schema matching and integration for data interoperability....

New Challenges : Large Scale Automatic Semantic Integration

Today schema matching is a basic problem in almost every data intensive distributed application, namely enterprise information integration, collaborating web services, ontology based agents communication, web catalogue integration and schema based P2P database systems. There has been a plethora of algorithms and techniques researched in schema matching and integration for data interoperability....

An Approach for Matching Schemas of Heterogeneous Relational Databases

Abstract: Schema matching is a basic problem in many database application domains, such as data integration. The problem of schema matching can be formulated as follows: "given two schemas, S_i and S_j, find the most plausible correspondences between the elements of S_i and S_j, exploiting all available information, such as the schemas, instance data, and auxiliary sources" [24]. Given the ...

EITH - A Unifying Representation for Database Schema and Application Code in Enterprise Knowledge Extraction

The integration of heterogeneous legacy databases requires understanding of database structure and content. We previously developed a theoretical and software infrastructure to support the extraction of schema and business rule information from legacy sources, combining database reverse engineering with semantic analysis of associated application code (DRE/SA). In this paper, we present a compa...

Schema Matching Bibtex

The proposed matching system aims to discover in an automatic way, the correspondence links A survey of approaches to automatic schema matching. Comparing SSD-placement strategies to scale a database-in-the-cloud. Generic Schema Matching, Ten Years Later. Corpus-based Schema Matching. Importing Items via basic bibliographic formats (Endnote, BibTex, RIS, TSV, as for outputing them in appropriat...

Publication date: 2013